Overview

Dataset statistics

Number of variables: 12
Number of observations: 2969
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 278.5 KiB
Average record size in memory: 96.0 B
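The memory figures can be cross-checked with a little arithmetic. The sketch below assumes every column is stored as an 8-byte float64 and that the frame carries a 128-byte RangeIndex; neither detail is stated in the report, but under those assumptions the numbers line up with the reported values.

```python
# Assumptions (not stated in the report): all 12 columns are 8-byte float64
# values and the DataFrame index is a RangeIndex costing 128 bytes.
N_ROWS = 2969
N_COLS = 12
BYTES_PER_VALUE = 8   # assumed float64
INDEX_BYTES = 128     # assumed RangeIndex overhead

total_bytes = N_ROWS * N_COLS * BYTES_PER_VALUE + INDEX_BYTES
total_kib = total_bytes / 1024
avg_record_bytes = total_bytes / N_ROWS
column_kib = (N_ROWS * BYTES_PER_VALUE + INDEX_BYTES) / 1024

print(f"Total size in memory: {total_kib:.1f} KiB")
print(f"Average record size:  {avg_record_bytes:.1f} B")
print(f"Per-variable size:    {column_kib:.1f} KiB")
```

Rounded to one decimal, these reproduce the 278.5 KiB, 96.0 B and 23.3 KiB figures shown in this report.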

Variable types

Numeric: 12

Warnings

gross_revenue is highly correlated with qtde_items [High correlation]
qtde_items is highly correlated with gross_revenue [High correlation]
avg_ticket is highly correlated with qtde_returns and 1 other field [High correlation]
qtde_returns is highly correlated with avg_ticket and 1 other field [High correlation]
avg_basket_size is highly correlated with avg_ticket and 1 other field [High correlation]
avg_ticket is highly skewed (γ1 = 53.44422362) [Skewed]
frequency is highly skewed (γ1 = 24.88049136) [Skewed]
qtde_returns is highly skewed (γ1 = 51.79774426) [Skewed]
avg_basket_size is highly skewed (γ1 = 44.67271661) [Skewed]
df_index has unique values [Unique]
customer_id has unique values [Unique]
avg_ticket has unique values [Unique]
recency_days has 34 (1.1%) zeros [Zeros]
qtde_returns has 1481 (49.9%) zeros [Zeros]
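As a rough illustration of the Skewed and Zeros warnings, the sketch below computes a skewness coefficient and the share of zeros on made-up data. It uses the plain population estimator g1 = m3 / m2^1.5; the report's exact estimator may include a bias correction. The threshold of 20 is an assumption, chosen because it is consistent with which variables are flagged above.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of observations that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


SKEW_THRESHOLD = 20  # assumed cut-off, not taken from the report

# Hypothetical toy data: mostly tiny values plus one extreme outlier,
# loosely mimicking the shape of a returns-like column.
toy_returns = [0] * 500 + [1] * 499 + [10_000]

g1 = skewness(toy_returns)
print(f"skewness = {g1:.2f}, flagged = {abs(g1) > SKEW_THRESHOLD}")
print(f"zeros = {zero_share(toy_returns):.1%}")
```

A single extreme value is enough to push the coefficient past the threshold, which matches the pattern in variables whose maximum is far above the 95th percentile.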

Reproduction

Analysis started: 2021-10-19 11:55:07.355658
Analysis finished: 2021-10-19 11:55:28.494853
Duration: 21.14 seconds
Software version: pandas-profiling v2.12.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2317.292354
Minimum: 0
Maximum: 5715
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 185.4
Q1: 929
Median: 2120
Q3: 3537
95-th percentile: 5035.2
Maximum: 5715
Range: 5715
Interquartile range (IQR): 2608

Descriptive statistics

Standard deviation: 1554.944589
Coefficient of variation (CV): 0.6710178739
Kurtosis: -1.010787014
Mean: 2317.292354
Median Absolute Deviation (MAD): 1271
Skewness: 0.342284058
Sum: 6880041
Variance: 2417852.674
Monotonicity: Strictly increasing
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
4264                 1      < 0.1%
1262                 1      < 0.1%
3323                 1      < 0.1%
1274                 1      < 0.1%
4424                 1      < 0.1%
1272                 1      < 0.1%
5366                 1      < 0.1%
4740                 1      < 0.1%
5364                 1      < 0.1%
3315                 1      < 0.1%
Other values (2959)  2959   99.7%

Minimum 5 values
Value                Count  Frequency (%)
0                    1      < 0.1%
1                    1      < 0.1%
2                    1      < 0.1%
3                    1      < 0.1%
4                    1      < 0.1%

Maximum 5 values
Value                Count  Frequency (%)
5715                 1      < 0.1%
5696                 1      < 0.1%
5686                 1      < 0.1%
5680                 1      < 0.1%
5659                 1      < 0.1%
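Several of the df_index statistics are derived from others, so they can be checked against each other. The sketch below recomputes range, IQR, coefficient of variation and variance from the reported values using their textbook definitions, and shows one simple way to test for a strictly increasing sequence. The helper name is illustrative, not part of pandas-profiling.

```python
# Values copied from the df_index block of this report.
minimum, maximum = 0, 5715
q1, q3 = 929, 3537
mean = 2317.292354
std = 1554.944589

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance


def is_strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


print(value_range, iqr, round(cv, 6), round(variance, 2))
print(is_strictly_increasing([0, 1, 2, 3, 4]))  # the five smallest df_index values
```

The recomputed figures agree with the reported Range (5715), IQR (2608), CV (about 0.671018) and Variance (about 2417852.67) up to rounding of the inputs.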

customer_id
Real number (ℝ≥0)

UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15270.77299
Minimum: 12347
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 12347
5-th percentile: 12619.4
Q1: 13799
Median: 15221
Q3: 16768
95-th percentile: 17964.6
Maximum: 18287
Range: 5940
Interquartile range (IQR): 2969

Descriptive statistics

Standard deviation: 1718.990292
Coefficient of variation (CV): 0.1125673398
Kurtosis: -1.206094692
Mean: 15270.77299
Median Absolute Deviation (MAD): 1488
Skewness: 0.03160785866
Sum: 45338925
Variance: 2954927.624
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
14335                1      < 0.1%
15679                1      < 0.1%
15689                1      < 0.1%
17736                1      < 0.1%
15687                1      < 0.1%
17734                1      < 0.1%
13636                1      < 0.1%
12722                1      < 0.1%
13634                1      < 0.1%
15681                1      < 0.1%
Other values (2959)  2959   99.7%

Minimum 5 values
Value                Count  Frequency (%)
12347                1      < 0.1%
12348                1      < 0.1%
12352                1      < 0.1%
12356                1      < 0.1%
12358                1      < 0.1%

Maximum 5 values
Value                Count  Frequency (%)
18287                1      < 0.1%
18283                1      < 0.1%
18282                1      < 0.1%
18277                1      < 0.1%
18276                1      < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2963
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2749.321711
Minimum: 6.2
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 6.2
5-th percentile: 229.77
Q1: 570.96
Median: 1086.92
Q3: 2308.06
95-th percentile: 7219.68
Maximum: 279138.02
Range: 279131.82
Interquartile range (IQR): 1737.1

Descriptive statistics

Standard deviation: 10580.62331
Coefficient of variation (CV): 3.848448607
Kurtosis: 353.944724
Mean: 2749.321711
Median Absolute Deviation (MAD): 672.16
Skewness: 16.77755612
Sum: 8162736.16
Variance: 111949589.6
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
533.33               2      0.1%
731.9                2      0.1%
379.65               2      0.1%
745.06               2      0.1%
734.94               2      0.1%
331                  2      0.1%
1842.14              1      < 0.1%
899.63               1      < 0.1%
9430.52              1      < 0.1%
278.74               1      < 0.1%
Other values (2953)  2953   99.5%

Minimum 5 values
Value                Count  Frequency (%)
6.2                  1      < 0.1%
13.3                 1      < 0.1%
15                   1      < 0.1%
36.56                1      < 0.1%
45                   1      < 0.1%

Maximum 5 values
Value                Count  Frequency (%)
279138.02            1      < 0.1%
259657.3             1      < 0.1%
194550.79            1      < 0.1%
168472.5             1      < 0.1%
140450.72            1      < 0.1%

recency_days
Real number (ℝ≥0)

ZEROS

Distinct: 272
Distinct (%): 9.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.28763894
Minimum: 0
Maximum: 373
Zeros: 34
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 11
Median: 31
Q3: 81
95-th percentile: 242
Maximum: 373
Range: 373
Interquartile range (IQR): 70

Descriptive statistics

Standard deviation: 77.75677911
Coefficient of variation (CV): 1.209513686
Kurtosis: 2.777962659
Mean: 64.28763894
Median Absolute Deviation (MAD): 26
Skewness: 1.798379538
Sum: 190870
Variance: 6046.116697
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
1                    99     3.3%
4                    87     2.9%
3                    85     2.9%
2                    85     2.9%
8                    76     2.6%
10                   67     2.3%
9                    66     2.2%
7                    66     2.2%
17                   64     2.2%
16                   55     1.9%
Other values (262)   2219   74.7%

Minimum 5 values
Value                Count  Frequency (%)
0                    34     1.1%
1                    99     3.3%
2                    85     2.9%
3                    85     2.9%
4                    87     2.9%

Maximum 5 values
Value                Count  Frequency (%)
373                  2      0.1%
372                  4      0.1%
371                  1      < 0.1%
368                  1      < 0.1%
366                  4      0.1%

qtde_invoices
Real number (ℝ≥0)

Distinct: 56
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.723139104
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 17
Maximum: 206
Range: 205
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 8.85653132
Coefficient of variation (CV): 1.547495379
Kurtosis: 190.8344494
Mean: 5.723139104
Median Absolute Deviation (MAD): 2
Skewness: 10.76680458
Sum: 16992
Variance: 78.43814702
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
2                    785    26.4%
3                    499    16.8%
4                    393    13.2%
5                    237    8.0%
1                    190    6.4%
6                    173    5.8%
7                    138    4.6%
8                    98     3.3%
9                    69     2.3%
10                   55     1.9%
Other values (46)    332    11.2%

Minimum 5 values
Value                Count  Frequency (%)
1                    190    6.4%
2                    785    26.4%
3                    499    16.8%
4                    393    13.2%
5                    237    8.0%

Maximum 5 values
Value                Count  Frequency (%)
206                  1      < 0.1%
199                  1      < 0.1%
124                  1      < 0.1%
97                   1      < 0.1%
91                   2      0.1%

qtde_items
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1671
Distinct (%): 56.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1608.852476
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 102.4
Q1: 296
Median: 641
Q3: 1401
95-th percentile: 4407.4
Maximum: 196844
Range: 196843
Interquartile range (IQR): 1105

Descriptive statistics

Standard deviation: 5887.578045
Coefficient of variation (CV): 3.659489067
Kurtosis: 465.998084
Mean: 1608.852476
Median Absolute Deviation (MAD): 422
Skewness: 17.85859125
Sum: 4776683
Variance: 34663575.24
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
310                  11     0.4%
88                   9      0.3%
150                  9      0.3%
288                  8      0.3%
272                  8      0.3%
246                  8      0.3%
260                  8      0.3%
84                   8      0.3%
300                  7      0.2%
114                  7      0.2%
Other values (1661)  2886   97.2%

Minimum 5 values
Value                Count  Frequency (%)
1                    1      < 0.1%
2                    2      0.1%
12                   2      0.1%
16                   1      < 0.1%
17                   1      < 0.1%

Maximum 5 values
Value                Count  Frequency (%)
196844               1      < 0.1%
80997                1      < 0.1%
80263                1      < 0.1%
77373                1      < 0.1%
69993                1      < 0.1%

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
UNIQUE

Distinct: 2969
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.89776151
Minimum: 2.150588235
Maximum: 56157.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 2.150588235
5-th percentile: 4.916661099
Q1: 13.11933333
Median: 17.95658654
Q3: 24.98828571
95-th percentile: 90.497
Maximum: 56157.5
Range: 56155.34941
Interquartile range (IQR): 11.86895238

Descriptive statistics

Standard deviation: 1036.934407
Coefficient of variation (CV): 19.98033011
Kurtosis: 2890.707126
Mean: 51.89776151
Median Absolute Deviation (MAD): 5.984842033
Skewness: 53.44422362
Sum: 154084.4539
Variance: 1075232.964
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
18.89583333          1      < 0.1%
12.27933333          1      < 0.1%
18.84171429          1      < 0.1%
26.79906433          1      < 0.1%
25.69635417          1      < 0.1%
20.43421053          1      < 0.1%
20.63668675          1      < 0.1%
16.83756098          1      < 0.1%
20.2366242           1      < 0.1%
13.11444444          1      < 0.1%
Other values (2959)  2959   99.7%

Minimum 5 values
Value                Count  Frequency (%)
2.150588235          1      < 0.1%
2.4325               1      < 0.1%
2.462371134          1      < 0.1%
2.511241379          1      < 0.1%
2.515333333          1      < 0.1%

Maximum 5 values
Value                Count  Frequency (%)
56157.5              1      < 0.1%
4453.43              1      < 0.1%
3202.92              1      < 0.1%
1687.2               1      < 0.1%
952.9875             1      < 0.1%

avg_recency_days
Real number (ℝ≥0)

Distinct: 1258
Distinct (%): 42.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.34851138
Minimum: 1
Maximum: 366
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 25.92307692
Median: 48.28571429
Q3: 85.33333333
95-th percentile: 201
Maximum: 366
Range: 365
Interquartile range (IQR): 59.41025641

Descriptive statistics

Standard deviation: 63.54492876
Coefficient of variation (CV): 0.9435238799
Kurtosis: 4.887109087
Mean: 67.34851138
Median Absolute Deviation (MAD): 26.28571429
Skewness: 2.062770925
Sum: 199957.7303
Variance: 4037.957972
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
14                   25     0.8%
4                    22     0.7%
70                   21     0.7%
7                    20     0.7%
35                   19     0.6%
49                   18     0.6%
21                   17     0.6%
46                   17     0.6%
11                   17     0.6%
5                    16     0.5%
Other values (1248)  2777   93.5%

Minimum 5 values
Value                Count  Frequency (%)
1                    16     0.5%
1.5                  1      < 0.1%
2                    13     0.4%
2.5                  1      < 0.1%
2.601398601          1      < 0.1%

Maximum 5 values
Value                Count  Frequency (%)
366                  1      < 0.1%
365                  1      < 0.1%
363                  1      < 0.1%
362                  1      < 0.1%
357                  2      0.1%

frequency
Real number (ℝ≥0)

SKEWED

Distinct: 1225
Distinct (%): 41.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1137973039
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.008894164194
Q1: 0.01633986928
Median: 0.02588996764
Q3: 0.04945054945
95-th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.03311068017

Descriptive statistics

Standard deviation: 0.4081562524
Coefficient of variation (CV): 3.586695275
Kurtosis: 989.3650758
Mean: 0.1137973039
Median Absolute Deviation (MAD): 0.0121913375
Skewness: 24.88049136
Sum: 337.8641954
Variance: 0.1665915263
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
1                    198    6.7%
0.0625               18     0.6%
0.02777777778        17     0.6%
0.02380952381        16     0.5%
0.08333333333        15     0.5%
0.09090909091        15     0.5%
0.02941176471        14     0.5%
0.03448275862        14     0.5%
0.01923076923        13     0.4%
0.02564102564        13     0.4%
Other values (1215)  2636   88.8%

Minimum 5 values
Value                Count  Frequency (%)
0.005449591281       1      < 0.1%
0.005464480874       1      < 0.1%
0.005479452055       1      < 0.1%
0.005494505495       1      < 0.1%
0.005586592179       2      0.1%

Maximum 5 values
Value                Count  Frequency (%)
17                   1      < 0.1%
3                    1      < 0.1%
2                    6      0.2%
1.142857143          1      < 0.1%
1                    198    6.7%

qtde_returns
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 214
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.1569552
Minimum: 0
Maximum: 80995
Zeros: 1481
Zeros (%): 49.9%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 9
95-th percentile: 100.6
Maximum: 80995
Range: 80995
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 1512.496135
Coefficient of variation (CV): 24.33349783
Kurtosis: 2765.52864
Mean: 62.1569552
Median Absolute Deviation (MAD): 1
Skewness: 51.79774426
Sum: 184544
Variance: 2287644.557
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
0                    1481   49.9%
1                    164    5.5%
2                    148    5.0%
3                    105    3.5%
4                    89     3.0%
6                    78     2.6%
5                    61     2.1%
12                   51     1.7%
7                    43     1.4%
8                    43     1.4%
Other values (204)   706    23.8%

Minimum 5 values
Value                Count  Frequency (%)
0                    1481   49.9%
1                    164    5.5%
2                    148    5.0%
3                    105    3.5%
4                    89     3.0%

Maximum 5 values
Value                Count  Frequency (%)
80995                1      < 0.1%
9014                 1      < 0.1%
8004                 1      < 0.1%
4427                 1      < 0.1%
3768                 1      < 0.1%

avg_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1979
Distinct (%): 66.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 249.8137641
Minimum: 1
Maximum: 40498.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 44
Q1: 103.25
Median: 172.3333333
Q3: 281.6923077
95-th percentile: 600
Maximum: 40498.5
Range: 40497.5
Interquartile range (IQR): 178.4423077

Descriptive statistics

Standard deviation: 791.5551894
Coefficient of variation (CV): 3.168581172
Kurtosis: 2255.538236
Mean: 249.8137641
Median Absolute Deviation (MAD): 83.08333333
Skewness: 44.67271661
Sum: 741697.0657
Variance: 626559.6179
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
100                  11     0.4%
114                  10     0.3%
82                   9      0.3%
86                   9      0.3%
73                   9      0.3%
136                  8      0.3%
60                   8      0.3%
75                   8      0.3%
88                   8      0.3%
288                  7      0.2%
Other values (1969)  2882   97.1%

Minimum 5 values
Value                Count  Frequency (%)
1                    2      0.1%
2                    1      < 0.1%
3.333333333          1      < 0.1%
5.333333333          1      < 0.1%
5.666666667          1      < 0.1%

Maximum 5 values
Value                Count  Frequency (%)
40498.5              1      < 0.1%
6009.333333          1      < 0.1%
4282                 1      < 0.1%
3906                 1      < 0.1%
3868.65              1      < 0.1%

avg_unique_basket_size
Real number (ℝ≥0)

Distinct: 906
Distinct (%): 30.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.48459137
Minimum: 0.2
Maximum: 259
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.3 KiB

Quantile statistics

Minimum: 0.2
5-th percentile: 2
Q1: 7.666666667
Median: 13.6
Q3: 22.14285714
95-th percentile: 46
Maximum: 259
Range: 258.8
Interquartile range (IQR): 14.47619048

Descriptive statistics

Standard deviation: 15.46030748
Coefficient of variation (CV): 0.8842246955
Kurtosis: 29.31744084
Mean: 17.48459137
Median Absolute Deviation (MAD): 6.6
Skewness: 3.43586152
Sum: 51911.75179
Variance: 239.0211074
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
13                   42     1.4%
9                    41     1.4%
8                    39     1.3%
16                   39     1.3%
14                   38     1.3%
17                   38     1.3%
11                   36     1.2%
5                    36     1.2%
7                    36     1.2%
15                   35     1.2%
Other values (896)   2589   87.2%

Minimum 5 values
Value                Count  Frequency (%)
0.2                  1      < 0.1%
0.25                 3      0.1%
0.3333333333         6      0.2%
0.4                  1      < 0.1%
0.4090909091         1      < 0.1%

Maximum 5 values
Value                Count  Frequency (%)
259                  1      < 0.1%
177                  1      < 0.1%
148                  1      < 0.1%
127                  1      < 0.1%
105                  1      < 0.1%

Interactions

[Figures: pairwise interaction plots between the variables, rendered with Matplotlib v3.3.2]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
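The definition above can be sketched in a few lines of plain Python. The data is made up for illustration, and the function name is not part of pandas-profiling. The 1/n factors of the covariance and the standard deviations cancel, so the code works with raw sums.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The common 1/n factors cancel between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
# Shifting and rescaling each variable separately leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 2 for y in ys])
print(r, r_rescaled)
```

Both calls return the same coefficient, which illustrates the invariance under changes in location and scale.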

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
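A minimal sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. Ties receive the average of their positions, which is a common convention but an assumption here. Function names and data are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical monotonic but nonlinear relationship: y = x**3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this data ρ reaches 1 while r stays below 1, which is the behaviour the paragraph above describes.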

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
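The pair-counting recipe can be sketched directly. The version below is the simple form often called tau-a, which matches the description above; library implementations commonly use a tie-adjusted variant (tau-b), so results can differ when the data contains ties. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1    # both variables move the same way
        elif direction < 0:
            discordant += 1    # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


# Hypothetical toy data: 8 concordant and 2 discordant pairs out of 10.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
tau = kendall_tau_a(xs, ys)
print(tau)
```

The quadratic loop over all pairs is fine for a sketch but slow for thousands of rows, where a sort-based algorithm would be preferable.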

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

(index)  df_index  customer_id  gross_revenue  recency_days  qtde_invoices  qtde_items  avg_ticket  avg_recency_days  frequency  qtde_returns  avg_basket_size  avg_unique_basket_size
0        0         17850        5391.21        372.0         34.0           1733.0      18.152222   35.500000         17.000000  40.0          50.970588        0.617647
1        1         13047        3232.59        56.0          9.0            1390.0      18.904035   27.250000         0.028302   35.0          154.444444       11.666667
2        2         12583        6705.38        2.0           15.0           5028.0      28.902500   23.187500         0.040323   50.0          335.200000       7.600000
3        3         13748        948.25         95.0          5.0            439.0       33.866071   92.666667         0.017921   0.0           87.800000        4.800000
4        4         15100        876.00         333.0         3.0            80.0        292.000000  8.600000          0.073171   22.0          26.666667        0.333333
5        5         15291        4623.30        25.0          14.0           2102.0      45.326471   23.200000         0.040115   29.0          150.142857       4.357143
6        6         14688        5630.87        7.0           21.0           3621.0      17.219786   18.300000         0.057221   399.0         172.428571       7.047619
7        7         17809        5411.91        16.0          12.0           2057.0      88.719836   35.700000         0.033520   41.0          171.416667       3.833333
8        8         15311        60767.90       0.0           91.0           38194.0     25.543464   4.144444          0.243316   474.0         419.714286       6.230769
9        9         16098        2005.63        87.0          7.0            613.0       29.934776   47.666667         0.024390   0.0           87.571429        4.857143
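The first rows suggest that avg_basket_size equals qtde_items divided by qtde_invoices. The report does not state this; it is inferred from the numbers. The sketch below checks that relationship on the first five sample rows, within the six-decimal rounding of the table.

```python
# (qtde_invoices, qtde_items, avg_basket_size) copied from the first five sample rows.
sample_rows = [
    (34.0, 1733.0, 50.970588),
    (9.0, 1390.0, 154.444444),
    (15.0, 5028.0, 335.200000),
    (5.0, 439.0, 87.800000),
    (3.0, 80.0, 26.666667),
]

# Assumed relationship (not documented in the report): items per invoice.
TOLERANCE = 1e-6  # the table shows six decimals

mismatches = [
    row for row in sample_rows
    if abs(row[1] / row[0] - row[2]) > TOLERANCE
]
print("rows that break the assumed formula:", mismatches)
```

An empty list means the sample is consistent with the assumed formula. It does not prove that the feature was built this way for every customer.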

Last rows

(index)  df_index  customer_id  gross_revenue  recency_days  qtde_invoices  qtde_items  avg_ticket  avg_recency_days  frequency  qtde_returns  avg_basket_size  avg_unique_basket_size
2959     5627      17727        1060.25        15.0          1.0            645.0       16.064394   6.0               1.000000   6.0           645.000000       66.000000
2960     5637      17232        421.52         2.0           2.0            203.0       11.708889   12.0              0.153846   0.0           101.500000       15.000000
2961     5638      17468        137.00         10.0          2.0            116.0       27.400000   4.0               0.400000   0.0           58.000000        2.500000
2962     5649      13596        697.04         5.0           2.0            406.0       4.199036    7.0               0.250000   0.0           203.000000       66.500000
2963     5655      14893        1237.85        9.0           2.0            799.0       16.956849   2.0               0.666667   0.0           399.500000       36.000000
2964     5659      12479        473.20         11.0          1.0            382.0       15.773333   4.0               1.000000   34.0          382.000000       30.000000
2965     5680      14126        706.13         7.0           3.0            508.0       47.075333   3.0               0.750000   50.0          169.333333       4.666667
2966     5686      13521        1092.39        1.0           3.0            733.0       2.511241    4.5               0.300000   0.0           244.333333       104.000000
2967     5696      15060        301.84         8.0           4.0            262.0       2.515333    1.0               2.000000   0.0           65.500000        20.000000
2968     5715      12558        269.96         7.0           1.0            196.0       24.541818   6.0               1.000000   196.0         196.000000       11.000000